-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-26652][SQL] Use Proleptic Gregorian Calendar in Literal.fromString #23596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| checkEvaluation(Literal.fromString("Databricks", StringType), "Databricks") | ||
| val dateString = "1970-01-01" | ||
| checkEvaluation(Literal.fromString(dateString, DateType), java.sql.Date.valueOf(dateString)) | ||
| val timestampString = "0000-01-01 00:00:00" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The year 0000 did not exist in our current era, and it is considered as an invalid year by java.time implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean a behavior change introduced by SPARK-26178, SPARK-26243, SPARK-26424 in 3.0.0? I'm expecting that you already documented that before changing this test case.
it is considered as an invalid year by java.time implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say it is bug fix. I guess old (current) implementation wasn't strong enough in checking its input.
Year zero does not exists in current (common) era (CE) and before common era (BCE) as well (see https://en.wikipedia.org/wiki/Year_zero).
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine but let me leave it to @hvanhovell
| def fromString(str: String, dataType: DataType): Literal = { | ||
| def parse[T](f: UTF8String => Option[T]): T = { | ||
| f(UTF8String.fromString(str)).getOrElse { | ||
| throw new AnalysisException(s"Cannot parse the ${dataType.catalogString} value: $str") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we just throw a ParseException here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks this can be called somewhere else not by the parser .. in that case, parse exception might not make much sense. It maybe has to be illigal argument exception or runtime exception as well.
| /** | ||
| * Constructs a Literal from a String | ||
| */ | ||
| def fromString(str: String, dataType: DataType): Literal = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait actually, is it ever properly called somewhere previously? If not, we could just remove it. Since all classes in catalyst are considered an internal API, we could just remove if that's not called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I remember you mentioned this method can be used somewhere like logging. If not, I will remove it. Also cc @gatorsmile as you added tests for the method recently: #22345
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an explicit plan to use this somewhere? If so, that's okay.
Otherwise, let's remove. In general, we shouldn't keep the codes for internal purpose that are not being called. Removed codes remaining in the commit and other branches - we can always revive them when it's actually used or in order to deduplicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove it now. It was there for TreeNode.fromJSON, which has been removed long time ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. Here is the PR to remove the methods: #23603
|
Test build #101441 has finished for PR 23596 at commit
|
| case "TIMESTAMP" => | ||
| val timeZone = getTimeZone(SQLConf.get.sessionLocalTimeZone) | ||
| toLiteral(stringToTimestamp(_, timeZone), TimestampType) | ||
| case "DATE" => toLiteral(DateType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just do Cast(Literal(str), DateType)? Or Literal.create(Cast(...).eval, DateType)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that in another PR but @hvanhovell doesn't like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks the current way is cleaner (see 016c696). I'm not sure where he said that in the PR though.
## What changes were proposed in this pull request? The `fromString` and `fromJSON` methods of the `Literal` object are removed because they are not used. Closes apache#23596 Closes apache#23603 from MaxGekk/remove-literal-fromstring. Authored-by: Maxim Gekk <maxim.gekk@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Timestamp.valueOfandDate.valueOfin parsingTimestampTypeandDateTypeliteral values inLiteral.fromStringsince the method uses the hybrid calendar (Julian+Gregorian) internally.stringToDateandstringToTimestampbecause they have been already ported on Proleptic Gregorian calendar which is required by SQL standard.Literal.fromStringfromAstBuilderin parsingTimestampandDateliteral values.How was this patch tested?
The changes were tested by
ExpressionParserSuiteandLiteralExpressionSuite.